Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems
نویسندگان
چکیده
Increasing attention has been paid to reinforcement learning algo rithms in recent years partly due to successes in the theoretical analysis of their behavior in Markov environments If the Markov assumption is removed however neither generally the algorithms nor the analyses continue to be usable We propose and analyze a new learning algorithm to solve a certain class of non Markov decision problems Our algorithm applies to problems in which the environment is Markov but the learner has restricted access to state information The algorithm involves a Monte Carlo pol icy evaluation combined with a policy improvement method that is similar to that of Markov decision problems and is guaranteed to converge to a local maximum The algorithm operates in the space of stochastic policies a space which can yield a policy that per forms considerably better than any deterministic policy Although the space of stochastic policies is continuous even for a discrete action space our algorithm is computationally tractable
منابع مشابه
Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes
Satinder Singh AT &T Labs-Research 180 Park Avenue Florham Park, NJ 07932 [email protected] Partially Observable Markov Decision Processes (pO "MOPs) constitute an important class of reinforcement learning problems which present unique theoretical and computational difficulties. In the absence of the Markov property, popular reinforcement learning algorithms such as Q-Iearning may no long...
متن کاملReinforcement Learning with Partially Known World Dynamics
Reinforcement learning would enjoy better success on real-world problems if domain knowledge could be imparted to the algorithm by the modelers. Most problems have both hidden state and unknown dynamics. Partially observable Markov decision processes (POMDPs) allow for the modeling of both. Unfortunately, they do not provide a natural framework in which to specify knowledge about the domain dyn...
متن کاملReinforcement Learning for Problems with Hidden State
In this paper, we describe how techniques from reinforcement learning might be used to approach the problem of acting under uncertainty. We start by introducing the theory of partially observable Markov decision processes (POMDPs) to describe what we call hidden state problems. After a brief review of other POMDP solution techniques, we motivate reinforcement learning by considering an agent wi...
متن کاملThe Infinite Regionalized Policy Representation-0.1cm
We introduce the infinite regionalized policy presentation (iRPR), as a nonparametric policy for reinforcement learning in partially observable Markov decision processes (POMDPs). The iRPR assumes an unbounded set of decision states a priori, and infers the number of states to represent the policy given the experiences. We propose algorithms for learning the number of decision states while main...
متن کاملA Reinforcement Learning System Based on State Space Construction Using Fuzzy ART A Reinforcement Learning System Based on State Space Construction Using Fuzzy ART
A new reinforcement learning system using fuzzy ART (adaptive resonance theory) is proposed. In the proposed method, fuzzy ART is used to classify observed information and to construct effective state space. Then, profit sharing is employed as a reinforcement learning method. Furthermore, the proposed system is extended to the hierarchical structures for solving partially observable Markov deci...
متن کامل